Assigning Unique Keys to Chemical Compounds for Data Integration: Some Interesting Counter Examples
Abstract
Integrating data involving chemical structures is simpler when unique identifiers (UIDs) can be associated with those structures; for example, the identifiers can serve as database keys. One common approach is to use the Unique SMILES notation introduced in [2]. Unique SMILES views a chemical structure as a graph with atoms as nodes and bonds as edges, and it uses a depth-first traversal of the graph to generate the SMILES string. The algorithm establishes a node ordering by using certain symmetry properties of the graph. In this paper, we present molecular graphs for which the algorithm fails to generate UIDs. Indeed, we show that different graphs in the same symmetry class employed by the Unique SMILES algorithm have different Unique SMILES IDs. We also tested the algorithm on the National Cancer Institute (NCI) database [7] and found several molecular structures for which it fails. Finally, we have written a Python script that generates molecular graphs for which the algorithm fails.
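To illustrate why a depth-first string generator needs a canonical node ordering at all, here is a minimal Python sketch. It is not the Unique SMILES algorithm of [2]: it assumes an acyclic molecule with single bonds and implicit hydrogens, and the names `ATOMS`, `BONDS`, and `dfs_string` are hypothetical, chosen only for this example.

```python
from typing import Dict, List, Optional

# Hypothetical toy molecule: the heavy atoms of ethanol, C-C-O.
# Atoms are graph nodes and bonds are undirected edges (adjacency lists).
ATOMS: Dict[int, str] = {0: "C", 1: "C", 2: "O"}
BONDS: Dict[int, List[int]] = {0: [1], 1: [0, 2], 2: [1]}


def dfs_string(
    atoms: Dict[int, str],
    bonds: Dict[int, List[int]],
    start: int,
    parent: Optional[int] = None,
) -> str:
    """Build a SMILES-like string by depth-first traversal.

    Assumes the graph is acyclic, so no ring-closure digits are needed.
    """
    children = [n for n in bonds[start] if n != parent]
    out = atoms[start]
    # Every child except the last is written as a parenthesised branch.
    for child in children[:-1]:
        out += "(" + dfs_string(atoms, bonds, child, start) + ")"
    # The last child continues the main chain.
    if children:
        out += dfs_string(atoms, bonds, children[-1], start)
    return out


# One string per possible starting atom.
strings = {start: dfs_string(ATOMS, BONDS, start) for start in ATOMS}
for start, text in strings.items():
    print(start, text)
```

Each starting atom yields a different string for the same molecule, so a plain traversal cannot serve as a database key. A canonicalization scheme has to fix the starting node and the branch order deterministically, and that ordering step is the one the abstract reports as failing for certain graphs.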
Similar Resources
An Empirical Study of the Universal Chemical Key Algorithm for Assigning Unique Keys to Chemical Compounds
In this paper, we introduce an algorithm that assigns an essentially unique key called the Universal Chemical Key (UCK) to molecular structures. The molecular structures are represented as labeled graphs whose nodes abstract atoms and whose edges abstract bonds. The algorithm was tested on 236,917 compounds obtained from the National Cancer Institute (NCI) database of chemical compounds. On thi...
Experimental Studies of the Universal Chemical Key (UCK) Algorithm on the NCI Database of Chemical Compounds
We have developed an algorithm called the Universal Chemical Key (UCK) algorithm that constructs a unique key for a molecular structure. The molecular structures are represented as undirected labeled graphs with the atoms representing the vertices of the graph and the bonds representing the edges. The algorithm was tested on 236,917 compounds obtained from the National Cancer Institute (NCI) da...
Some Selected Research Accomplishments
• 2003 introduced an algorithm that assigns unique IDs to chemical compounds. In 2003, Kasturi, Hamelberg, Liu and I showed that there is a natural graph algorithm that can assign essentially unique IDs to chemical compounds. We call these unique chemical IDs or UCKs [40]. The basic idea was to define certain classes of natural operations on labeled graphs. For example, one can assign a new (lo...
Keys and Armstrong Databases in Trees with Restructuring
The definitions of keys, antikeys, and Armstrong instances are extended to complex values in the presence of several constructors. These include the tuple, list, set, and union constructors. Nested data structures are built using the various constructors in a tree-like fashion. The union constructor complicates all results and proofs significantly. The reason for this is that it comes along with non-tri...
Frustrated Lewis Pair Chemistry: Searching for New Reactions.
Frustrated Lewis pair chemistry has developed rapidly in recent years. It offers possibilities of developing new variants of known reactions and of finding new chemical transformations. This is demonstrated and described by the recently developed FLP-formylborane chemistry, which has led to the formation of the unique (η2-formylborane)FLP adducts and opened a way of preparing a gen...